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PARTIALLY OBSERVED INFORMATION AND INFERENCE 
ABOUT NON-GAUSSIAN MIXED LINEAR MODELS^ 

By Jiming Jiang 

University of California, Davis 

In mixed linear models with nonnormal data, the Gaussian Fisher 
information matrix is called a quasi-information matrix (QUIM). The 
QUIM plays an important role in evaluating the asymptotic covari- 
ance matrix of the estimators of the model parameters, including the 
variance components. Traditionally, there are two ways to estimate 
the information matrix: the estimated information matrix and the 
observed one. Because the analytic form of the QUIM involves pa- 
rameters other than the variance components, for example, the third 
and fourth moments of the random effects, the estimated QUIM is not 
available. On the other hand, because of the dependence and nonnor- 
mality of the data, the observed QUIM is inconsistent. We propose 
an estimator of the QUIM that consists partially of an observed form 
and partially of an estimated one. We show that this estimator is 
consistent and computationally very easy to operate. The method is 
used to derive large sample tests of statistical hypotheses that involve 
the variance components in a non-Gaussian mixed linear model. Fi- 
nite sample performance of the test is studied by simulations and 
compared with the delete-group jackknife method that applies to a 
special case of non-Gaussian mixed linear models. 

1. Introduction. Mixed linear models are widely used in practice, espe- 
cially in situations involving correlated observations. A typical assumption 
regarding these models is that the observations are normally distributed, or, 
equivalently, that the random effects and errors in the model are normal. 
However, as is well known, the normality assumption is likely to be violated. 
For example, Lange and Ryan [19] gave several examples that show that non- 
normality of the random effects is, indeed, encountered in practice. The au- 
thors also developed a method for assessing normality of the random effects. 
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Due to such concerns, some researchers have considered the use of Gaussian 
maximum hkehhood (ML) or restricted maximum hkehhood (REML) esti- 
mators in nonnormal situations; see Richardson and Welsh [22], Jiang [12, 13] 
and Heyde [10, 11], among others. Throughout this paper these estimators 
will be called ML and REML estimators even if normality does not hold. In 
particular, Jiang [12, 13] established consistency and asymptotic normality 
of REML and ML estimators in nonnormal situations under regularity con- 
ditions. Furthermore, Jiang [12] derived the asymptotic covariance matrix 
(ACM) of the REML estimator of the variance components as well as that 
of the ML estimator without assuming normality. Also see [14]. The ACM 
is important for various inferences about the model parameters, including 
interval estimation and hypothesis testing. Unfortunately, the ACM under 
nonnormality involves parameters other than the variance components, for 
example, the third and fourth moments of the random effects. Note that 
standard procedures such as ML and REML do not produce estimators of 
these additional parameters. For years this complication has undermined the 
potential usefulness of the ACM in nonnormal situations. 

To see exactly where the problem occurs, consider the mixed linear model 

(1) y = X(3 + Ziai + --- + Zsas + e, 

where y is an x 1 vector of observations, X, Zi, . . . , are known matri- 
ces, /3 is a p X 1 vector of unknown parameters (the fixed effects), ai, . . . , 
are vectors of random effects and e is a vector of errors. It is assumed that 
as,e are independent. Furthermore, the components of aj are i.i.d. 
with mean and variance 1 < j < s, and the components of e are i.i.d. 
with mean and variance Uq. Without loss of generality, let rank(X) =p. 
Note that normality is not assumed in this model, nor is any other spe- 
cific distribution assumed. Also, w.l.o.g. consider the Hartley-Rao form 
of the variance components [9]: A = do, = (Jj /a^, I < j < s. Let 9 = 
(A,7i,...,7.)'- 

According to [12], the ACM of the REML estimator 6 is given by 

where is the Gaussian restricted log-likelihood function, that is, the log- 
likelihood based on z = T'y, where y satisfies (1) with normally distributed 
random effects and errors, and T is an x (N — p) matrix of full rank 
such that T'X = 0. The matrix I2 = ^{dH^/d9de') depends only on 9, 
whose estimator is already available. However, unlike X2, the matrix Ii = 
Vav{dl^/d9) depends on, in addition to 9, the kurtoses of the random effects 
and errors. Similarly, let ip = {(3' 9')' and let ^ijj be the ML estimator of ^. By 
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the result of Jiang [14], it can be shown that the ACM of ip is given by 

where I is the Gaussian log-Ukehhood. Here, again, the matrix X2 = Fi{d'^l/ 
dil^dif)') depends only on 9, but the matrix Xi = \&T:{dl / difj) depends on, in 
addition to 0, the kurtoses as well as the third moments of random effects 
and errors. 

It is clear that the key issue is how to estimate Ti, which we call the 
quasi-information matrix (QUIM) for an obvious reason. Consider, for ex- 
ample, the ML case. If I were the true log-likelihood, then we would have 
Ii = — X2, which is the Fisher information matrix. Traditionally, there are 
two ways to estimate the Fisher information: (i) the estimated information 
and (ii) the observed information. See, for example, [7] for a discussion and 
comparison of the two methods in the i.i.d. case. It is known that stan- 
dard procedures in mixed model analysis such as ML and REML do not 
produce estimators of the third and fourth moments of the random effects 
and errors. Therefore, according to our previous discussion, method (i) is 
not possible unless one finds some way to estimate these higher moments. 
Assuming that the random effects and errors are symmetrically distributed, 
in which case the third moments vanish, Jiang [15] proposed an empirical 
method of moments (EMM) to estimate the kurtoses of the random effects 
and errors. It is clear that this method has a limitation, because, like nor- 
mality, symmetry may not hold in practice. When the third moments are 
nonzero, the EMM cannot be used. Furthermore, the situation to which the 
EMM applies is somewhat restrictive and requires certain orthogonal de- 
compositions of the linear spaces generated by the design matrices of the 
random effects. Simulation results have suggested that the EMM estimator 
may have large variance even when the sample size is moderately large. As 
for method (ii), it is not all that clear how this should be defined in cases 
of correlated observations. For simplicity, let us assume that ip is a. scalar. 
With independent observations, we have 




where li is the log- likelihood based on yi, the ith observation. Therefore, an 
observed information is Ti = Y^f=i{dli / d^\^)'^ , where 'ij) is the ML estima- 
tor. This is a consistent estimator of Ii in the sense that Ti — Ti = op(Ti) 
or, equivalently, X\IX\ ^ 1 in probability. However, if the observations are 
correlated, (3) does not hold. In this case, since X\ = E{(5//9^)^}, one 
might attempt to define Ii = {dl/dip\j^)'^ . However, this is zero since is 
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the MLE. Even if ^/^ is a different (consistent) estimator, the expression 
is not a consistent estimator. For example, in the independent case this 
is the same as i^f=idli/dil)\^Y , which, asymptoticahy, is equivalent to N 
times the square of a normal random variable. Therefore, it is not true that 
Xi — Xi = op(Xi). Alternatively, if normality holds, one may define /j as the 
logarithm of the conditional density of yi given yi, ■ ■ ■ ,yi-i- It follows that 
dli/dip, 1 <i < N, is a sequence of martingale differences [with respect to 
the <T-fields J-'i = (j{yi, ■ ■ ■ ,yi), 1 < « < N]. Thus, we still have (3) but with 
new definitions of li's; hence Xi can be defined similarly as in the indepen- 
dent case. However, if normality does not hold, this latter strategy also does 
not work (because dli/dtp is no longer a martingale difference). 

We now explain our approach to the problem using a simple example. 

Example 1. Consider the following model with crossed random ef- 
fects: yij = fi + Vi + Wj + eij,i = 1, . . . ,m,j = I, . . . ,n, where fi is an un- 
known mean, Vi and Wj are random effects, and Cij is an error. It is as- 
sumed that the Vi^s are i.i.d. with mean and variance erf, the Wj^s are 
i.i.d. with mean and variance o"!, the e^j's are i.i.d. with mean and 
variance ctq, and v, w and e are independent. Consider an element of the 
QUIM \ai{dlYi/d6) for REML estimation, say, var(5/R/5A), where Zr, is 
the Gaussian restricted log-likelihood and 9 = (A, 71, 72)' (A and 7's as de- 
fined earlier). By the result of Jiang ([16], Section 5, Example 2), it can 
be shown that dl^/dX = {u'Bu — {mn — 1)A}/2A^, where u = y — film ® In 
with y = {yij)i<i<m,i<j<n (as a vector in which the components are ordered 
as yu,...,yin,y2i,---) and 

B = Im(E>In--(l - ) /m ® ^ - — f 1 - -r— ) Jm ® In 



n\ 1 + 'Jin J m V 1 + 72^71 

1 / 1 1 \ 

+ 1 - — — ]Jm<S)Jn 

mn V 1 + JiTi 1 + 72?7i / 

= I-m® In + Al/m ® Jn + A2>/m ® In + ^sJm "X" Jn- 

Hereafter, In and In represent the n-dimensional identity matrix and the 
vector of I's, respectively, J„ = Inl^ and ® means Kronecker product. De- 
fine Ko = E(efi) - 3A2, Ki = E{vf) - SX'^-ff, K2 = E{wf) - SX'^-y^ (note that 
these are the kurtoses) and to = 1 + -^1 + -^2 + A3, ti = {{m — l)n}/{m(l -|- 
7in)}, t2 = {m(n — l)}/n(l -|- 72m)}; mo = mn, mi = m and m2 = n. By 
Lemma 1 in the sequel, it can be shown that 

■dlR\ 
dX 



var 



E<^ (ao + ai + 02) ^ ufj - oi E E " '^2 E E ' 
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(4) 

mn 

+ ■ 



"^{(1 + 71 + 12? - its + U)} ^^'"''"^ + *^i4n) 



where 



Si + S2, 



"° 4A4' 4A4n(n3-l)' 4A4m(m3-l)' 

+ 72 + linf - (1 + 71 + 72)^ 
— 1 ' 

m(l + 71 + 72m)2 _ (1 _^ _^ ^2)^ 



^3 
t4 



— 1 

It is clear that 52 can be estimated by replacing the variance components 
by their REML estimators, which are already available. As for Si, it cannot 
be estimated in the same way for the reason given above. However, the 
form of [cf. with (3)] suggests an "observed" estimator by taking out 
the expectation sign and replacing the parameters involved by their REML 
estimators. In fact, as m,n^ 00, this observed 5i, say, Si, is consistent 
in the sense that Si/Si ^ 1 in probability. It is interesting to note that 
S2 cannot be consistently estimated by an observed form. In conclusion, 
Si cannot be estimated by an estimated form, but can be estimated by an 
observed form; 5*2 can be estimated by an estimated form, but not by an 
observed form. Thus, we have reached a balance. 



We propose to use such a method to estimate the QUIM. Because the 
estimator consists partially of an observed form and partially of an estimated 
one, it is called a partially observed quasi-information matrix (POQUIM). 

One application of POQUIM is to derive robust dispersion tests in mixed 
linear models. A dispersion test is a test of a statistical hypothesis that in- 
volves the variance components. Such tests, exact or asymptotic, are avail- 
able in the literature (e.g., [17, 23]), but only under the normality assump- 
tion. Since the latter is likely to be violated in practice, as a robust approach, 
it is of interest to derive dispersion tests that do not rely on normality. Using 
the results of Jiang [12, 14], it is possible to derive an asymptotic dispersion 
test based on either the REML or the ML estimators without assuming nor- 
mality, provided that the ACM can be consistently estimated. The POQUIM 
will provide such a consistent estimator. 

The rest of the paper is organized as follows. In Section 2 we explain how 
one comes up with the decomposition (4), that is, we derive POQUIM for 
a general non-Gaussian mixed linear model with REML estimation of the 
variance components. Sufficient conditions will be given for the consistency 
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of POQUIM as well as an estimator of the ACM of the REML estimator. 
In Section 3 we use several examples to illustrate the main results of Sec- 
tion 2. In Section 4 we consider POQUIM for ML estimation. In Section 5 
we apply POQUIM to robust dispersion tests in mixed linear models. Some 
simulated examples are considered in Section 6, in which we study the finite 
sample performance of POQUIM in the context of robust dispersion tests 
and compare it with the delete-group jackknife method of Arvesen [1] (also 
see [2]) in a case where the latter applies. In Section 7 we discuss extension of 
POQUIM to quasi-likelihood estimation and remark on other issues. Proofs 
and other technical details are given in Section 8. 

2. POQUIM for REML. The REML case is relatively simple compared 
to ML, because only estimation of the variance components is involved. 
Furthermore, as will be seen, the QUIM in this case does not involve the 
third moments of the random effects and errors. 

Under model (1) and normality, the restricted log- likelihood for estimating 
the variance components A and 7j, 1 < j < s, is 



where 6* = (A,7i, . . . ,7s)', c is a constant, V = Var(y) = X{I + ^iZiZ[ H h 

^sZgZg) (/ is the N X N identity matrix), T is any N x (N — p) matrix 
such that rank(r) = N — p and T'X = (| • | means determinant), and P = 



T{T'VT)~^T' = V'^ -V-^X{X'V-^X)'^X'V~^ (e.g., [23], page 451). If 



normality does not hold, (5) is not the true restricted log-likelihood, but, 
instead, the quasi-restricted log-likelihood. It is shown in Section 8.1 that 
dl^/dOj = u'BjU — bj, < j < s, where Oq = X, Oj = Jj , 1 < j < s; u = y — XP; 
Bo = (2A)-ip, Bj = {X/2)PZjZ'jP, I < j < s; bo = {N - p)/2\ and bj = 
{\/2)tr{PZjZ'j), l<j<s. Note that bj = E{u'Bju), 0<j<s. 

2.1. Derivation. Let Uj = yi — x[f3 be the ith component of n, where 
x[ is the ith row of X. The kurtoses of the random effects and errors are 
defined as Kt = E(ati) ~ ^'^^ — ^(Q^ti) — 3(A74)^, < t < s, where ao = £ 
and 7o = 1. Also, with a slight abuse of the notation, let and zu be the 
iih. row and /th column of Zj, respectively, < t < s, where Zo = I. De- 
fine T{ii,i2) = S?=o7t(-^nt ' ^i2t)- Here the dot product of vectors oi, . . . ,afc 
of the same dimension is defined as ai • a2 • • • = X]/ oi/ffl2« ' "O-ki- Also, let 
rrit be the dimension of aj, < t < s (so that mo = N). We begin with 
an expression for coy {ui-^Ui^^Ui-^Ui^) (1 < ii,...,i4 < A^) as well as one for 
cov{dlB./d9j,dl^/d6k), the {j,k) element of Ti. 

Lemma 1. We have 

COV {Ui^Ui2,Ui^Ui^) 



(5) 



h{e)=c-^{logi\T'VT\)+y'Py}, 



(6) 



s 



A^{r(ii, i3)r(«2, ^4) + r(ii, i4)r(«2, ^3)} + '^t^^i* ■ ■ " ^»4t) 



t=0 
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where Zi^t • • • Zi^t = Zi^t • Zi^t • Zi^t • Zi^f Furthermore, we have 

The proof is given in Section 8.2. 

Let /i, . . . , /l be the different nonzero functional values of 

s 

(8) f{ii,---,i4:)=^i^tZiif-Zi^f 

t=o 

Note that this is the second term on the right-hand side of (6). Here func- 
tional value means /(ii, . . . ,^4) as a function of k = {Kt)o<t<s- For example, 
kq + Ki and K2 + K3 are different functions (even if their values may be the 
same for some k). Also, let denote the zero function (of «;). Then without 
using (7) we have 



«1,...,«4 



f{ii,...M)=0 
L 

'=1 f{h,-M)=fi 

L 

1=0 

with S"/, < I < L, defined in obvious ways. According to Lemma 1, the 
left-hand side of (9) depends on the higher moments only through k. By (6) 
and (8) we have 

(10) 5o = 2A^ Bj^i^^i^Bk^i^^iJ{ii,i-i)T{i2,iA), 

/(n,...,M)=0 

which depends only on 6. Furthermore, for 1 < / < L write 

Sl=Cl ^ COY {Ui^Ui^,Ui^Ui^) 
f{ii,...,ii)=fl 

f{ii,...,ii)=fl 
= 5'; 1 + 5; 2, 
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where q is a constant to be determined later on. By (6) we have 

/(n,...,i4)=/i 

= /; X! i^j,il,i2Bk,i3M- Cl) -\ , 

f{ii,-M)=fl 

where • • • depends only on 6. If we let the coefficient of fi in the above be 
equal to zero, we have 

where | • | denotes cardinality. With this choice of ci, we have 

Si,2 = >? XI {Bj,n,i2Bk,i3,u -Q){r(n,«3)r(«2,^4) +r(ii,i4)r(i2,i3)} 
/(n,...,i4)=/i 

f{ii,...M)=fl 

which depends only on 9. Note that q depends only on 9. On the other hand, 
by the fact that E(nj^nj2) = Ar(zi, ^2) (see the first paragraph of Section 8.2), 
we have 

Si,i = ci {E{ui^---Ui^) - X^T{ii,i2)T{i3,u)} 

/(n,...,M)=/i 



\ /(«!,. ••,M) = /i 



A^Q E T{ii,is)r{i2,i4). 

J{ii,...M)=fl 



Note that ^^(^^^ ^i^)^^^ r(ii, i2)r(i3, ^4) = E/(ii,...,i4)=/, r(^i' ^3)r(i2, u), be- 
cause /(ii, . . . , ^4) is symmetric in ii, . . . , ^4. Therefore, we have, by combin- 
ing the above. 



\ /(«!,. ■■,«4)=/i 



(12) +2A^ Y Bj^i^^i^Bk,i3^i4T{h,h)^{i2,u) 

f{h,...M)=fl 

-3A^Q Y r(ii,i3)r(«2,i4)- 
/(n,...,i4)=/i 

Note that q defined by (11) depends on j and k, that is, q = Cj^k,i- If 
we define Cj,fc(n, • • • ,^4) = Cj,fc,/, if /(ii, . • . ,i4) = /«, 1 < / < -i^, then by (9), 
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(10) and (12) it can be shown that 



cov 



(1^, = e| E ^3^k{k U)ui, ■■■uA+2 tr{B,VB,V) 
^ ' y f{ii,.. .,14)^0 ) 

-3A^ E Cj,fc(ii,...,i4)r(ii,i3)r(i2,i4)- 

/{ji,...,j4)7^0 

We summarize the result in terms of a theorem. Write li^jk = cov{dlYi/d9j, 
dlR/dOk), which is the j, A; element of the QUIM Ji = Vav{dlR/d9). 

Theorem 1. For any non-Gaussian mixed linear model (1), we have 

s nit 

Jijfc = 2iT{BjVBuV) + E Y.^z[iBjZu){z'uBkZu) 

t=0 1=1 



(13) 



+ \2ti{B,VBkV) - 3A2 c,,fc(ii, 
I /(n,...,i4)7^0 



,u)r(n,i3)r(i2,i4) 



/(n,...,i4)7^0 

< j, < s, w/iere Cj,fc(n,. . . ,i4) = Cj,a;,/, if f{ii,-- ■,i4) = fi, 1<1<L, with 

|{/(Z1,...,Z4)-/Z}|^(.^_^^)^^^ 

Of course, (13) can be verified directly, but the derivation above also 
explains where the thought came from. Note that 2tr(BjVBkV) is the 
Gaussian covariance between dl-^/dOj and dl-^/dOk. This means that un- 
der normality Ti^ijk is identical to the second term in Ii^2,jk with the 
negative sign removed. Of course, this can be easily verified using (6). 
On the other hand, without normality ^i^ijk may involve higher moments 
of the random effects and errors, and this is why the expectation is not 
taken inside the summation. Instead, we propose to estimate Ii,i,jk by 
taking out the expectation sign and replacing any parameter involved by 
its REML estimator, that is, ii,ijk = T>f{h,...,u)^QCj,kik, ■ ■ ■ ,i4:)uii ■ ■ -Uu, 
where Cj^k{ii, ■ ■ ■ jii) is defined in the same way as Cj^k{ii, ■ ■ ■ ,^4) except 
with 6 replaced by 6, and Ui = Ui — x'^p. Here 6 is the REML estimator 
of 61, p = {X'V-^X)-^X'V-^y and V is V with 6 replaced by 0. Note 
that the set {(«i, . . . , Z4) : /(«i, . . . , 14) = //} does not depend on 6. It fol- 
lows that Cj^kih, . . . ,^4) = Cj^k,i if fik, • • • ,^4) = //, 1<1<L, where Cj^k,i is 
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given by (14) with B replaced by B. Here Bj^i^^i^ is Bj^i^^i^ with 9 replaced 
by 9, and so forth. This is the observed part. 

On the other hand, Ii,2,jk depends only on 9 and, therefore, can be esti- 
mated by replacing 9 by 9. The result, denoted by 2'i,2jfc5 is the estimated 
part. 

An estimator of Iijk is then ii,ijk + ^i,2jfc; hence an estimator of 2i is 
given by Xi =2'i,i +ii,2, where Ii^r = {ii,r,jk)o<j,k<s, r = 1,2. Because the 
estimator consists partially of an observed form and partially of an estimated 
one, it is called a partially observed quasi-information matrix (POQUIM). 

This is exactly where the decomposition (4) came from. We now use 
another simple example to illustrate the POQUIM decomposition, with more 
examples to come in Section 3. 

Example 2. Consider a one-way random effects model yij = n + ai + 
Eij, i = 1, . . . ,m, j = 1, . . . ,n, where /i is an unknown mean; the random 
effects ai,...,am are i.i.d. with mean and variance af; the errors e^j's 
are i.i.d. with mean and variance ctq; and a and e are independent. It 
is, in this case, more convenient to use a double index (i.e., ij instead of 
i). It is easy to show that /(iiji, . • . , ^4^4) = if not ii = • • ■ = i^; ki if 
ii = • • • = ii but not ji = • • • = ji] and fio + ki if ii = • • • = ^4 and ji = • • • = 
j4^. Thus, L = 2 [note that L is the number of different functional values 
of /(^iji) • • • ) ^^4)]- Define the following functions of 9, where 9 = (A, 71)': 
to = 1 — {71/(1 + 7in)} — 1/(1 -|- 7in)?Tin, ti = {(m — l)n/m(l -|- 7iri)} and 
t3 = {n(l + 7in)2 _ (1 + 7i)2}/(n3 - 1). Then the POQUIM is given by 

iiM=iixkl+iiXkU k,l = 0,l, where 



2^1,1,00 



'1,1,01 



tl 




(m — l)(tin 



'1,1,11 



'1,2,00 



4A'^(1 -|- 'jin)^m{n' 
J {m-l)to ^ 



4A3(1 -I- 7in)2m """^ 



(m-l)2 



4A2(1 -K7in)^m2 

1 

2A2 



mn - 1 - ^mnt^{{l + ■jif - is} - ^mtfis 



2 2 

2i 



^ {m-l)n r /3\ (t,n-to)t3 + (l + 7i)% ] 

2A(l + 7in)l \2j l + 7in J' 
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~ (m — l)(m — 3)n^ 

2l,2,ll = 



4m(l + 7in)^ 

Uij = Vij — y-- - and the t's are the t's with 9 replaced by 9, the REML 
estimator. 



Computational note. The following list outlines a numerical algo- 
rithm for POQUIM: 

1. Determine the sets of indices 5/ = {(zi, . . . , ^4) : /(ii, . . . , 14) = 1 < / < 
L. Then, for each (j, k), < j < k < s, do the following. 

2. Compute Cj^k,h 1 <l < L. Note that the denominator in (14) is 

3. Compute Ti^ijfc = J2f{iu...M)j^o^j,k{h, ■ ■ . ,ii)ui^- ■ -Uu, where Cj^k{h, 

is defined the same way as cj^kih, ■ ■ ■ ,^4) above (14) with 9 replaced by 
9 and Ui=yi- x[i3. Note that ,14)7^0 = E51 + • • • + Hsl ■ 

4. Compute ii^2,jk, which is Ii,2,jk with 9 replaced by 9. See step 3 for the 
summation. 

5. Let lijk =2^i,i,jfc +2^i,2,jfc- 

All except step 1 are fairly straightforward. As for step 1, the sets may be 
determined as follows. First, the index (1, 1, 1, 1) belongs to Si. Also compute 
the vector f 1,1, 1,1 = {zu • zu ■ zu • zit)o<t<s- Then compute the vector f 1,1, 1,2 = 
{zif zif zif Z2t)o<t<s- If ^^1,1, 1,2 = ^^1, 1,1,1, the index (1,1,1,2) belongs to Si; 
otherwise it belongs to ^2, and so on. 

The main theoretical result in this section is the consistency of POQUIM. 
To state the result, we need some additional notation. 

2.2. Notation. For a vector a = (a;), the 1-norm of a is defined by ||a||i = 
J2i {d-il- A sequence of matrices M is bounded from above if ||M|| is bounded; 
the sequence is bounded from below if ||M~^|| is bounded, where the norm 
of a matrix M is defined as ||M|| = {Amax(M'M)}i/2 with A 

max repre- 
senting the largest eigenvalue. A positive definite matrix-valued function 
M{9), which may depend on A^, is said to be uniformly continuous at 9 if 
\\M-^/^{9)M{9 + A)M-i/2(6i) _ /|| ^ as A ^ uniformly in A^, where / 
is an identity matrix of fixed dimension. It is easy to show that M{9) is uni- 
formly continuous if and only if for any r] > 0, there is 5 > such that | A| < 5 
implies (1 - r])M{9) <M{9 + A)<{1 + r])M{9) for ah A^. An estimator M 
of a positive definite matrix M, which may depend on A^, is consistent if 
j\^^-i/2 _ J ^ Q probability, where / is an identity matrix of fixed 
dimension. In the following discussion, all the sequences of numbers (vec- 
tors, matrices) depend on A^, but for notational simplicity the subscript A^ 
is suppressed. For example, in condition (iii) of Theorem 2 below go means 
go,N, et cetera. 
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Recall that z'^f. is the ith row of Zt, < t < s, with Zq = I. Let w[ = 
{z[q, z'j^i, . . . , z'^J . Define di as a vector of the same dimension as Wi such 
that the jth component of di is 1 if the corresponding component of Wi is 
nonzero and the jth component of di is if the corresponding component of 
Wi is zero. Note that di is an indicator of what random effects and errors are 
involved in the expression of yi. Let hi denote the denominator in (14) and let 
/i/^^/j be the cardinality of the set of (ii, . . . ,^3) such that f{ii, • ■ • ,^4) = //i, 

/(is, . . . , is) = fh and {di^-\ hdj^) • (dj^ H hdjg) / 0. Here, recaU that for 

two vectors a = (ai) and b = (bi), the dot product is defined as a-b = J2i <^ih- 

Also recall that Q = 2{ii{BjV BkV)}Q<j^k<s is the Gaussian information 
matrix [see the remark below (14)], that is, Ti = Q under normality. More 
generally, let Q denote Q with 9 replaced by 6* as a function of 6. For any 
(5 > 0, define Qj^ki^) = sup^^g \Qj^k - Qj,k\ , where Mj^k denotes the j, k 

element of a matrix M, and define dj^k,i{5) = sup^gg, \0_g\<:s\cj,k,i — 

where Cj^k,i is defined by (14) and Cj^k,i is Cj^k,i with 9 replaced by 9. 

Finally, recall that the asymptotic covariance matrix of the REML esti- 
mator, 9, is given by (2), that is, = T2^^i^2^ ■> where Ti = Var ((9/^/50) 
is the QUIM defined in Section 1, and I2 = E{d^ 1^^/89 89'). The POQUIM 
estimator of is defined by = T^^TiT^^, where Ii is the POQUIM 
and I2 is the estimated I2 obtained by replacing the variance components 
in I2 by their REML estimators. 

2.3. Consistency. It should be pointed out that the definition of REML 
estimator in non-Gaussian mixed linear models differs slightly according to 
several authors. In [22] the REML estimator is defined as the solution to the 
REML equation; in [12] the REML estimator is defined as the solution to the 
REML equation plus the requirement that it belong to the parameter space; 
in [13] the REML estimator is defined as the maximizer of the Gaussian 
restricted likelihood. In fact, the last showed that, for balanced mixed linear 
models, such a maximizer is a consistent estimator of 9; for an unbalanced 
mixed linear model, it showed that a sieved maximizer is consistent. Note 
that from a practical point of view the sieve puts no restriction on the 
maximization, because the maximizer is always within a sieve that satisfies 
the conditions (of Jiang [13], with a suitable constant). Therefore, in the 
following theorem the REML estimator is understood as the maximizer of 
the Gaussian restricted likelihood in the sense of Jiang [13] (with the sieves 
in the unbalanced case; see above). This eliminates any possible confusion 
as to which solution, or root, to the REML equation to use when there are 
multiple roots (e.g., [23], Section 8.1). 

Theorem 2. Suppose that (i) erf > 0, < var(a^]^) <oo, <t<s; (ii) 
\xi\, ll^itlli, 1 <t < s, 1 <i < N are bounded; (iii) there is a sequence of diag- 
onal matrices G = diag((7o, ■ ■ ■ ,gs) with gj > 0, < j < s, such that G~^QG~^ 
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is bounded from above as well as from below and Amin(^'^ ^X) oo; 
(iv) {9j9k)~^J2i'=ihi\cj^k,i\, < j,k < s, are bounded and (gjOk)"'^ x 
T.h,l2=ihi,h\^jAh(^j,Kh\ ^ 0> < j,A; < s; (v) {gjgk)"^gj,k{S) and 
{gjgk)~^J2i=ihidj^k,i{^) —^0, < j,k < s, uniformly in N as 5 — > 0. Then 
the POQUIM Ii and the POQUIM estimator Sr are both consistent. 

Remark 1. The first part of condition (iii) (regarding Q) is equivalent 
to the AI^ condition of Jiang [12, 13], which, together with erf > 0, < t < s, 
guarantees the consistency of the REML estimator 9. Furthermore, condition 
(iii) ensures the consistency of /3 = {X'V~^X)~^X'V~^y, where y is 1^ with 
9 replaced by 6. Finally, by the proof of Lemma 3 in the sequel, it can be 
shown that the first part of condition (v) [regarding gj^k{^)] is equivalent to 
Q being uniformly continuous at 6. 

The proof of Theorem 2 is given in Section 8.3. 

3. Examples. We now consider some examples and show that the condi- 
tions of Theorem 2 are satisfied in typical situations of non-Gaussian mixed 
linear models. 

3.1. A balanced two-way random effects model. Example 1 was used in 
Section 1 to illustrate the POQUIM method. We now revisit this example 
and verify the conditions of Theorem 2. 

Condition (i) is satisfied if cif > 0, t = 0, 1,2, and < var(t'^), var(i(;f), 
var(efi) < oo. 

Condition (ii) is automatically satisfied, because here Xij = 1 and Zijt, t = 
0, 1, 2, are vectors with one component equal to 1 and the other components 
equal to 0. Note that, as in Example 2, it is more convenient to use a double 
index, ij, instead of i. 

By Jiang [12], condition (iii) is satisfied with go = y/rrm, gi = y/m and 52 = 
^/n if fjf > 0, t = 0, 1,2, and m,n ^ 00. See the remarks below Theorem 2. 
Note that here X'V~^X = mn/X{l + jin + 72m). 

Now consider condition (iv). It is easy to show that f{iiji, . . . , iiji) = if 
not zi = • • • = Z4 or ji = • • • = J4; ki if ii = ■ • • = Z4 but not ji = ■ ■ ■ = ji', 
1^2 if ii = • • • = JA but not «i = • • • = 24; and kq + ki + K2 if i\ = ■ ■ ■ = 
14 and ji = • • • = J4. Thus, L = 3. It is easy to show that hi = mnin? — 
1), /i2 = nm{m? - 1), h^ = mn; |co,o,i| oc n~^, |co,o,2| <^ m~^, |co,o,3| oc 1; 
|co,i,i| oc |co,i,2| oc m~^n"2, [00,1,3! oc n"^; |co,2,i| oc n~^m~^, |co,2,2| tx 
m~'^, |co,2,3| ocm~^; |ci,i,i[ ocn~^, |ci,i,2| oc |ci,i,3[ ocn"^; |ci,2,i| oc 

m~^n~^, |ci,2,2| oc n"^m"^, |ci,2,3| oc m~^n~^; |c2,2,i| oc |c2,2,2| oc 

and 1 02,2,3! oc m~^. It follows that the first part of condition (iv) is 
satisfied as m,n^ 00. 
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Furthermore, it is easy to show that hi^i oc mn'^{m + n), hi^2 m^n^(m + 
n), /ii^3 oc mn^{m + n), /i2,2 oc m7n{m + n), /i2,3 oc m^n{m + n) and /i3^3 oc 
mn X (m + n). It fohows that hi^^ i^ < c(m~^ + n~^)hi^hi^, 1 < ^1,^2 !^ 3. 
Therefore, we have 

I] ^/i,/2|cj,fc,ZiCj,fc,«2| <c(';^ + i)< (5j5fe)~^^/i/|cj,fc,/ 

/l,/2 = l ^ ^ I /=i 

as m, n — > 00, < j. A; < 2, using the already verified first part. 

As for condition (v), it is easy to show that the derivatives of Cj^k,i with 
respect to are bounded by quantities of the same order as |cj^fc,«|- It follows 
that dj^k,i{S) is bounded by 6 times a quantity of the same order as |cj^fc,«|- 
Thus, the second part of condition (v) is satisfied by the verified first part 
of condition (iv). By a similar argument, the first part of condition (v) is 
satisfied. 

In conclusion, all the conditions of Theorem 2 are satisfied with go = y/mn, 
9i = ^-iid (72 = y/n, provided that erf > 0, t = 0, 1, 2, < var(uf ), var(ii;f ), 
vav{eii) < 00 and m,n^ 00. 

3.2. A balanced two-way mixed effects model. In the previous example 
the only fixed effect is an unknown mean fi. This time we consider a model 
that involves more fixed effects. We assume that Uij = Pj + + Eij, i = 

1.. ..,m, j = 1, . . . ,n, where the /3j's are unknown fixed effects, Oj's are 

1.1. d. random effects with mean and variance fif , e^j's are i.i.d. errors with 
mean and variance a^, and a and e are independent. Again we verify the 
conditions of Theorem 2. 

Condition (i) holds if erf > 0, t = 0, 1, and < var(af ), var(ef]^) < 00. 
Condition (ii) is automatically satisfied. 

By Jiang [12], condition (iii) is satisfied with = ^mn and g\ = \fm 
as long as crf>0, i = 0,l, m ^ co and n > 2. Note that this result holds 
regardless of n — > 00 or not. 

Now consider (iv). It is easy to show that /(iiji, . . . , i4i4) = if not 
ii = • • • = U; Ki if ii = • • • = ^4 but not ji = ■ ■ ■ = J4; and hq + Ki \i ii = ■ ■ ■ = 

and ji = • • • = ji- Thus L = 2. It is easy to verify that hi = mn{n^ — 1) 
and /i2 = mn. Furthermore, we have |co,o,i| ocn~^, |co,o,2| oc 1, |co.i,i| ocn~^, 
1^0,1,2! oc n"^, |ci^i^i| oc and |ci^i^2| oc It follows that the first part of 
condition (iv) is satisfied as m — > 00. Also, we have hi^i oc mn^, hi^2 oc mn^ 
and /i2,2 oc mn^; hence /i/i,/2 ^ cm~^hi-^hi^, 1 < /i,/2 ^ 2. Thus, for the same 
reason as in the previous subsection, the second part of condition (iv) is 
satisfied as m — > 00. 

By similar arguments as in the previous subsection, condition (v) is sat- 
isfied. 

In conclusion, all the conditions of Theorem 2 are satisfied with go = ^/mn 
and gi = ^/rn, provided that erf > 0, t = 1,2, < var(af), var(ef]^) < 00, 
m — > 00 and n > 2. 
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3.3. An unbalanced nested error regression model. In the previous ex- 
amples the data are balanced in the sense that there are equal numbers of 
observations per cell (e.g., [23], Chapter 4). In this subsection we consider 
an unbalanced case. The model may be viewed as an extension of Example 2 
in Section 2, which can be expressed as yij = x'^^fi + ai + Eij, i = 1, . . . , m, 
j = 1, . . . ,ni, where (n^ > 1) is the size of the ith cluster, Xij is a vector 
of known covariates, /? is a |3-dimensional vector of unknown regression co- 
efficients, Oj's are i.i.d. random eff^ects with mean and variance af, e^j's 
are i.i.d. errors with mean and variance cJq, and a and s are independent. 
When Xij = 1, (5 = jJ, and Ui = n, 1 <i <m, the model reduces to Exam- 
ple 2 of Section 2. Such a model is useful in a number of application areas, 
including small area estimation (e.g., [3, 8]). Here, once again, we verify the 
conditions of Theorem 2. 

Condition (i) is satisfied provided that > 0, r = 0, 1, and < var(af ), 
vaT{sii) < oo. Condition (ii) is satisfied if \xij\ is bounded. By Jiang [12] 
it can be shown that condition (iii) is satisfied with go = \/iV, where N = 
Si^i is the total sample size and gi = y/m, provided that m ^ oo, p is 
bounded, limsup(m/A^) < 1 and 



liminf 





>0, 



(15) 

where Xj. = n^^ Si=i ^o^^ that limsup(?7i/A^) < 1 ensures that, asymp- 
totically, the random effects and errors can be separated, that is, the variance 
components are asymptotically identifiable [12]. Although condition (15) can 
be further weakened, it is more intuitive and satisfied in most cases. 

Now consider condition (iv). The function /(iiji, . . . , ^4^4) has the same 
expression as in Example 2 of Section 2. Thus we have L = 2, hi = J27^i ^i(^f " 
1) and /i2 = N. Furthermore, it can be shown that the iiji, 12 ji element of 
Bq is 

R -J_/l 71 -, 

-°o,?.iii,i2j2 \ Ji=i2) ij^^^^^^ -^(«i=«2) 



«2 



^heie D = YH^iX[DiXi withXj = (x-j)i<j<„- and A = Im-ni'^ + lini) ^Jm- 
Similarly, the iiji, 1232 element of Bi is 



„ _ 1 f l(n=»2) 

^i,nji,«2i2 ~ 



2A I (1 + 7iniJ(l -^7171^2 

D~^x 



^il Tl"'j2 - Vn-1; 



(l + 7inijH l + 7i?^*2 
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/ 71 - Vn-inn-i/^ 7i'^»2 - 

where Q = Z]i^i{^i/(l + 7i^'t)}^^t-^i-- Thus, it can be shown that |co,o,i| oc 
^/YliLi^iinf - 1), |co,o,2| oc 1, |co,i,i|, |ci,i,i| oc rn/J2'j^^ni{nf - 1) and 
|co,i,2|; |ci,i,2| OC m/N . It follows that the first part of condition (iv) is satis- 
fied. Furthermore, it can be shown that \hi^^i^\ < c/iz./i/j X^Eli <''^''V 
{T.T=i nT) X {YT=i ^1, ^2 = 1, 2, provided that limsup(m/iV) < 1, where 
c is a constant and = (3 — Ir)'^ ■, r = 1,2. Note that here we use the fact 
that, by Holder's inequality and the fact that nj > 1, it can be shown that 
{YT=ini)/{YT=int) < {m/Nf/^, which implies that hio^YXLinf, because 
limsup(m/iV) < 1. Thus, by the first part of condition (iv), the second part 
of condition (iv) is satisfied, provided that 

Note that, for example, if the n^'s are bounded, then the left-hand side 
of (16) is 0(?n~i). 

Finally, condition (v) can be verified using the same arguments as in part 
(v) of the previous two subsections. 

In conclusion, all the conditions of Theorem 2 are satisfied with = ^/N 
and gi = y/m, provided that cr^ > 0, r = 0, 1, < var(af), var(e^;^) < oo, 
m oo, p is bounded, limsup(m/A^) < 1, and (15) and (16) hold. Note 
that the conditions do not include that ^ oo. In fact, in most practical 
situations the n^'s are small (e.g., Ghosh and Rao [8]). 



3.4. A random intercept/slope model. So far in the examples the number 
of different functional values for f{ii, . . . ,14), defined by (8), is bounded, that 
is, L is bounded. We now consider a case in which L increases with A^. 

Suppose that two measures are collected from each of m patients, once 
before and once after a surgery, but, because of the availability of patients, 
the measures are made at different times after the time of the surgery. It is 
thought that the recovery is a linear function of time, but the slope depends 
on the individual patient. For the ith patient, the baseline measure made 
before the surgery can be expressed as /3o + Oj , where (3q is an unknown mean 
and ttj is a random effect; after the surgery, a measure is collected at time tj 
from the surgery and the improvement can be expressed as {Pi + bi)ti on top 
of the baseline, where Pi is an unknown parameter and bi is another random 
effect. Of course, each time there is a random measurement error. This model 
can be expressed as yi = Po + ai + Ci, ym+i = Po + PiU + ai + biU + e^+i, 
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i = 1, . . . ,m, where yi and ym+i correspond to the measurements from the 
ith patient before and after the sm'gery. It is assumed that the a^'s are i.i.d. 
with mean and variance af, the bi's are i.i.d. with mean and variance o"!, 
the ej's are i.i.d. with mean and variance (Tq, and a, b, e are independent 
(see the discussion in Section 7). 

Now consider the conditions of Theorem 2. Condition (i) is satisfied if 
cTf > 0, t = 0,1,2, and < var(af), var(6f), var(ef) < oo. Condition (ii) is 
satisfied provided that the tj's are bounded. By Jiang [12], it can be shown 
that condition (iii) is satisfied with gj = ^/m, j = 0,1,2, provided that m — > 
oo and the tj's are bounded from above and away from zero. 

Now consider condition (iv). For 1 < i < = 2m, write i = 2(1 — 1) + r, 
where 1 <l <m and r = 1,2. Then it can be shown that the ii,i2 element 
of Bj can be expressed as 0(l)l(/^=/2) + 0{m~^), j = 0, 1, 2. For simphcity, 
assume that the tj's are ah different. Then it is easy to see that /(ii, . . . , 24) = 
if not h = ■ ■ ■ = Ki if /i = • • • = /4 but not ri = ■ ■ ■ = r^; kq + ki if li = 

• • • = Z4 and ri = • • • = r4 = 1; and ko + Ki+ tfK2 \.ili = ■ ■ ■ = 1^ = 1 and ri = 

• • • = r4 = 2, 1 <l <m. Thus, we have, in particular, L = m + 2. It is easy to 
show that hi = 14m, /i2 = m and h2+i = 1, 1 <l <m. Furthermore, we have 
\cj,k,i\ = 0{1), l<l<m + 2. It follows that (5j5'fc)~^{^i|cj,fe,i| + /i2|cj,fc,2| + 
J2h=i h2+i\cj,k,2+i\} = m~^0{m) = 0(1); hence the first part of condition (iv) 
is satisfied. Similarly, we have hi^i = 0{m), hi^2 = 0{m), /ii,2+« = 0(1)) 1 < 
/ < m, /i2,2 = m, h2,2+l = 1, 1 < / < m, and /i2+/,2+/' = !{;=/'), I < hi' < 
m. Thus, we have {gjgk)~'^Ylh,i2=ihh,h\cj,k,hCj^k,i2\ = 0{m~^); hence the 
second part of condition (iv) is satisfied. By similar arguments as in the 
previous examples, condition (v) is satisfied. 

In conclusion, all the conditions of Theorem 2 are satisfied with gj = 
j = 0, 1,2, provided that erf > 0, t = 0, 1, 2, < var(af ), var(6f ), var(ef ) < 00, 
m — > 00, and the tj's are bounded from above and away from zero, and are 
different. The last condition that the tj's are all different is only for technical 
convenience (otherwise L may be less than m + 2, but the conditions can be 
verified similarly). 

4. POQUIM for ML. In this section we derive POQUIM for ML esti- 
mation. Under model (1) and normality, the log-likelihood for estimating /3 
and 9 is given by 

(17) l{(3,e) = c - Ulog{\V\) + {y- X(3yv-\y - X(3)}, 

where c is a constant. If normality does not hold, (17) is considered the 
quasi-log-likelihood. It is easy to show that dl/dj3 = X'V~^u, and dl/dOj = 
u'CjU -Cj,0<j<s, where Cq = {2\)-^V-^, Cj = {\/2)V-^ ZjZ'-V-^ , 1 < 
j < 1, Co = N/2\, Cj = {X/2)tv{V~^ZjZ'j), l<j<s, and again, u = y- X[5. 
Note that Cj = E{u'Cju), <j < s. Let V"^Xj = qj = {qj,i)i<i<N, where Xj 
is the jth column of X . 
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Using the expression (17), it can be shown that 

Next, similar to Lemma 1, the fohowing equations can be easily derived. 
Lemma 2. We have 

s 

(18) co\ {ui^,Ui^Ui^) =^E(a^i)zijt • Zi^t ■ Zi^t, 

t=o 

Write t{ii,i2,i3) = ^{ui-^Ui^Ui^) , which is the right-hand side of (18). Let 
ti, 1 < ^ < -fi', be the different functional values of t{ii,i2,iz) [as functions of 
the third moments; see (18)]. Then, by similar arguments as in the previous 
section, it can be shown that 



cov 



dl dl 



U(ji,i2,i3)7^0 J 



*(«l,«2,«3)7^0 

where cij^kik, 12,13) = cij,k,i if i(n,*2,'i3) =ti, 1<1<K, with 

Furthermore, recah that f{ii, . . . ,^4) is defined by (8). Then, similar to the 
previous section, define C2j^kih, . . . ,u) = C2j,fc,i if fih, ■ ■ ■ ,14,) = fi, 1 < I < 
L, with 

Then we have similar expressions for cov{dl/d0j,dl/d6k) (with the only 
difference from Section 2.1 being that B is replaced by C). We summarize 
the results as foUows. As before, write ip = {(3' 6')' and, again, write the 
QUIM as li = Varidl/dip) = {'Iijk)i<j,k<p+s+i- 



Theorem 3. For any non-Gaussian mixed linear model (1), we have 
(21) Iijk = Xjy ^Xk=Ti^2,jk, 
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that is, Ii,i,jk = 0, 1 < J, ^ < p; 



t=0 1=1 



(22) 



U{ji,j2,j3)7^0 ) 



that is, Ii^2,j{p+k+i) = Oj 1 < i < < /c < s; and 

s mt 

Ti,{p+j+i){p+k+i) = 2tv{CjVCkV) + ^Kt'^{z[iCjZti){z[iCkZti) 

t=0 1=1 

X! C2,j,k{ii,---,ii)Uii---Ui4\ 
(23) +\2tiiCjVCkV) 



3A^ C2j,fc(«i,---,u)r(ii,i3)r(i2,i4) > 

f{h,...M)j^O ) 



— ^l,l,ip+j+l)ip+k+l) +2^1,2,(p+j+l)(p+fc+l), 

0<j,k<s, where cij^kiii, 12,13) and C2,j^k{ii, ■ ■ ■ ,14,) are defined by (19) and 

(20) , respectively. 

Similar to Section 2, the POQUIM is given by Ti = {Tijk)i<j.k<p+s+i, 
where Iijk =ii,i,jk +ii,2,jk, ii,i,jk is the observed part obtained by taking 
the expectation sign out of Ii^ijk, if there is one, and then replacing the 
parameters involved by their ML estimators; and Ii,2,jk is the estimated 
part obtained by replacing 9 hy 9, the ML estimator, in Ii^2,jki if the latter 
is nonzero. Let ip = {(3' 9')' be the ML estimator of ip. Then, according to 
the discussion in Section 1, the ACM of "ij) is given by S =T^^XiT^^, where 
I2 = ^{dH/di^dip'). Thus, the POQUIM estimator of S is given by S = 
X2^^XiX2^^, where Xi is the POQUIM and X2 is X2 with 9 replaced by 9. 
Similar to Theorem 2, sufficient conditions can be given for the consistency 
of Xi and S. The details are omitted. 

We now use a simple example to illustrate the POQUIM for ML given by 

(21) -(23). 
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Example 2 (continued). Here p = s = 1. It is easy to show that Ii^n 
mn/X{l + 7in), 

/ \ 3 



1,12 



2A3(l+7in)2 _ 



1-71 



n - 



+ <^ 7in + 



(1 -7i)n 
n+ 1 



E 



and ii,i3 = {1/2X^(1 + jin)^}Y.iiY.jUijf, where A, 71 are the ML esti- 
mators. Furthermore, we have ii,(j+2)(fc+2) = ^i,i,(j+2)(/fc+2) +^i,2,(j+2){fc+2), 
j,k = 0,1, where 

/ \ 4 \ 



1,22 



2^1,1,23 



'1,2,22 



2,23 



1,33 



n{n - (7in+ 1 - 71)^} 
4A4(l + 7in)2n(n3 - 1) 

_^ (7in + l-7i)^ .4 
4A4(l + 7in)2 ^ 

71+1-71 

4A3(1 + 7in)3(n2 + n + 1) 
7in + 1 - 71 



EE 



Ui 



E 



E E^ 



E^ 



+ 



4A3(1 + 7in)3 



E4' 



2A2 



3^n(l + 7in)2-(l + 7i)2 



1 



7in + 1 - 71 
1 + 71 



n 



(1 +7in) 



7in + 1 - 71 



(I+71)' 



mn 



2A(l + 7in) 



1 +7in 

3\ (n + 1 - 7i){n(l + 7in)2 _ + ^^)2| 



(1 + 7in)2(n2 +77,+ 1) 

_ /3\ (7171+1 -7i)(l- 
(l+7iri)2 



\2n 



4A2(l + 7in)4 



EE 



Ui 



and Ti 



TTiTl 



2,33 



4(l + 7in)2' 



5. Robust dispersion tests. In this section we consider an application of 
the results on POQUIM to robust dispersion tests in mixed linear models. 
The tests considered here are robust in the sense that they do not require 
normality. A dispersion test may be regarding both the fixed effects and the 



MIXED LINEAR MODELS 



21 



variance components or only the variance components, and both ML and 
REML estimators may be used in such a test. To be more specific, here we 
consider dispersion tests regarding only the variance components based on 
the REML estimators. 

Consider the following general hypothesis regarding 9 in model (1): 

(24) Ho : K'e = 99, 

where is a specified vector and i^' is a known (s + 1) x r matrix with 
rank(J^) = r. We assume that the REML estimator 6 is asymptotically nor- 
mal with mean and ACM Sr, that is, 

(25) - ^) — ' ^(0> ^^+1) in distribution. 

Sufficient conditions for (25) can be found in, for example, [12]. It is then 
easy to show that, under the null hypothesis (24), 

(26) {K'e - ip)'{K"LR,K)-^{K'9 - ip) — > xl in distribution. 

We then replace by its POQUIM estimator of Section 2 to obtain 
the test statistic 

(27) = {K'e - p,y{K'tnK)-\K'e - (^). 

The following theorem states that has the same asymptotic null distri- 
bution as (26). 

Theorem 4. Suppose that the conditions of Theorem 2 are satisfied. 
Furthermore suppose that (25) holds. Then, under the null hypothesis, — > 
Xr in distribution. 

In cases where some components of e are specified under the null hypoth- 
esis, it is customary to use these specified values, instead of the estimators, 
in the POQUIM estimator. Under the null hypothesis this may improve the 
accuracy of the POQUIM estimator, although the difference is expected to 
be small in large samples (because of the consistency of 6; e.g., [12]). It is 
easy to see, by examing the proofs of Theorems 2 and 4, that the same con- 
clusion of Theorem 4 holds after such a modification. (Note that the only 
property of e used in the proof of Theorem 2 is its consistency.) We consider 
a simple example. 

Example 2 (continued). Suppose that one wishes to test the hypothesis 
Hq: 71 = 1, that is, the variance contribution due to the random effects is the 
same as that due to the errors. Note that in this case e = (A, 71)', so the null 
hypothesis corresponds to (24) with K = (0, 1)' and ip = 1. Furthermore, we 
have K'Y^YiK = Sr^h, which is the asymptotic variance of 71, the REML 
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estimator of 71. Thus, the test statistic is = (71 — 1)^/Sr^ii, where Sr,ii 
is the POQUIM estimator of Sr^h (see Section 2). It is easy to show that 

, , ' ^Ti^iiJl 00 - 22'i^oi^2,oo2^2,oi +^1,00^2,01 
(28) ER,n = -p-- -^-^ , 

(,-'-2,00-^-2,11 --^■2,01) 

where Iijk = 1iA,jk +Ii,2,jk, j,k = 0,1, and Ii,r,jk, r = 1,2, are given in 
Example 2 in Section 2, but with 71 replaced by 1, its value under Hq; 
furthermore, we have 12,00 = —{mn— 1)/2A^, X2,oi = —{m — l)n/2A(l + 7in) 
and 22,11 = —{rn — l)n^/2(l + 7in)^, again with 71 replaced by 1, where A 
is the REML estimator of A. The asymptotic null distribution is Xi- t^is 
next section the finite sample performance of this test will be investigated. 

6. Simulations. In this section we consider two simulated examples. The 
goal is to study the finite sample performance of POQUIM, whose large 
sample properties were studied in Section 2 (Theorem 2) and later in Sec- 
tion 4 in the context of the robust dispersion test (Theorem 4). The latter 
will be the focus of our simulation study. 

The first example is the one-way random effects model considered in Ex- 
ample 2. Note that this model is a special case of the unbalanced nested 
error regression model of Section 3.3. However, by restricting to the bal- 
anced case we are able to make a direct comparison with the delete-group 
jackknife method [1, 2]. 

The second example is a balanced two-way random effects model. Note 
that the jackknife method does not apply to this case. In fact, when the 
random effects and errors are not normal or symmetric, POQUIM is the 
only method that is known to apply, by Section 2, at least in large samples. 
Now our goal is to investigate its finite sample performance. 

6.1. A balanced one-way random effects model. Consider once again Ex- 
ample 2 (continued) in Section 5, where the hypothesis to be tested is Hq: 
7i = 1. For example, such a test may be of genetic interest, which corre- 
sponds to Hq: h? = 2, where h? = 4cTf/((TQ + 0"^) is the heritability. We con- 
sider a test based on REML estimation of the variance components. More 
specifically, we are interested in the situation when m is increasing while n 
remains fixed. Therefore, the following sample size configurations are consid- 
ered: Case I, m = 50, n = 2; Case II, m = 400, n = 2. Case I represents a mod- 
erate sample size, while Case II represents a large sample size. In addition, 
we would like to investigate different cases in which normality and symmetry 
may or may not hold. Therefore, the following combinations of distributions 
for the random effects and errors are considered: Case i, Normal-Normal; 
Case ii, DE-NM(— 2, 2, 0.5), where DE represents the double exponential dis- 
tribution and NM(/xi, /i2, p) denotes the mixture of two normal distributions 
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with means fii, 1^2, variance 1 and mixing probability p [i.e., the probabil- 
ities 1 — p and p correspond to N{pi, 1) and N{p2, l)i resp.]; and Case iii, 
CE-NM(— 4, 1,0.2), where CE represents the centralized exponential distri- 
bution, that is, the distribution of X — 1, where X ~ Exponential(l). Note 
that in Case ii the distributions are not normal but symmetric, while in 
Case iii the distributions are not even symmetric — a further departure from 
normality. Also note that all these distributions have mean 0. They are 
standardized so that the distributions of the random effects and errors have 
variances erf and ctq, respectively. The true value of p is set to 1.0. The true 
value of o"o is also chosen as 1.0. 

According to Section 5, the x^-test statistic is given by 



.2 _ (71 - 1)^ 



(29) X 

^R,ll 

where 71 is the REML estimator of 71, Sr^h is given by (28) and 

(30) A = ^-fsSE+^^^ 



mn — 1 V n + 1 

Here 

m n 

SSE = ^^(y,,-y,.)^ 

i=ij=i 

and 



SSA = n^(?/i. 

1=1 

with yi. = n~'^YJj=iyij and y.. = (mn)'^ J2'iLiJ2]=iyij ■ Note that (30) is 
the REML estimator of A under the null. 

Arvesen [1] proposed a delete-group jackknife method and established 
consistency, using [/-statistics. Furthermore, Arvesen and Schmitz [2] pro- 
vided simulation results. The delete-group jackknife applies to cases where 
data can be divided into i.i.d. groups, such as the current situation. We re- 
fer to this method as jackknife. The method is briefly described as follows. 
Let Xi, . . . ,Xm be i.i.d. observations and let 6 be an unknown parameter. 
Let 6 be an estimator of 9 based on all the observations and let be the 
estimator based on all but the ith. observation, obtained otherwise the same 
way as 6. Define Oi = mO — {m — 1)6 -i, 1 < i < m. The jackknife estimator 
of 6 is defined as ^jack = rn~^ J2^i ^i; that is, the average of the 6*4 's. 

Now consider the one-way random effects model of Example 2. Instead 
of deleting the ith observation, one deletes the ith group consisting of the 
observations yij, j = 1, . . . ,n. A dispersion test that is often of genetic in- 
terest is Hq: 7i = 710, which corresponds to Hq: /i^ = 47io/(l +710)) where 
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/i^ = 4cjf/((TQ + erf) is the heritability. Arvesen and Schmitz proposed use 
of the jackknife estimator with a transformation. Let 9 = log(l + 7in) and 
9 = log(MSA/MSE), where MSA and MSE are the between and within group 
mean squares. A test of Hq will be based on 

(31) t - V^(^jack - 0) 

\/0^-l)-^E™i(^-^jack)2' 

which is expected to have an asymptotic tm-i null distribution [1]. 

To make a fair comparison, we note that a test of type is omnibus 
rather than directional (e.g., [21], Section 1.1). In other words, a test 
is typically used in situations of two-sided hypotheses. On the other hand, 
a t-test is appropriate to both one- and two-sided hypotheses. Therefore, 
we consider testing Hq: 7i = 1 against Hi: 71 7^ 1. For each simulated data 
set, the test statistics (29) and (31) are computed. The simulated sizes that 
correspond to the usual nominal levels 0.01, 0.05 and 0.10 are reported in 
Table 1. Furthermore, the simulated powers at a number of alternatives, 
namely, 71 = 0.2,0.5,2,5, are reported in Tables 2-4. All results are based 
on 10,000 simulations. 

Overall, the jackknife appears to be more accurate in terms of the size, 
especially when m is relatively small (Case I). On the other hand, the sim- 
ulated powers for POQUIM are higher at all alternatives, especially when 
m is relatively small (Case I). However, it would be misleading to conclude 
that the POQUIM has higher power than the jackknife, because the power 
comparison is considered fair only if the two tests have similar sizes. In other 
words, for the case of m = 50, the higher power for POQUIM could be the 
result of the test overrejecting. Finally, note that the jackknife with the log- 
arithmic transformation is specifically designed for this kind of model where 
the observations are divided into independent groups, while the POQUIM is 
for a much richer class of mixed linear models where the observations may 
or may not be divided into independent groups, as we will see in the next 
simulated example. 

Table 1 
POQUIM versus jackknife — size 



Simulated size 



Nominal level 


Method 


I-i 


I-ii 


I-iii 


Il-i 


Il-ii 


Il-iii 


0.01 


POQUIM 


0.022 


0.026 


0.028 


0.011 


0.013 


0.015 




Jackknife 


0.010 


0.014 


0.020 


0.009 


0.011 


0.013 


0.05 


POQUIM 


0.070 


0.078 


0.091 


0.054 


0.057 


0.063 




Jackknife 


0.052 


0.053 


0.068 


0.053 


0.053 


0.060 


0.10 


POQUIM 


0.123 


0.132 


0.151 


0.106 


0.108 


0.114 




Jackknife 


0.099 


0.103 


0.122 


0.104 


0.103 


0.109 
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Table 2 

POQUIM versus jackknife — power (nominal level 0.01^ 



Simulated power 



Alternative 


Method 


I-i 


I-ii 


I-iii 


Il-i 


Il-ii 


Il-iii 


71 


= 0.2 


POQUIM 


0.506 


0.616 


0.468 


1.000 


1.000 


1.000 






Jackknife 


0.487 


0.463 


0.454 


1.000 


1.000 


1.000 


71 


= 0.5 


POQUIM 


0.112 


0.164 


0.122 


0.914 


0.891 


0.793 






Jackknife 


0.108 


0.121 


0.137 


0.921 


0.866 


0.787 


71 


= 2.0 


POQUIM 


0.354 


0.256 


0.221 


0.995 


0.971 


0.913 






Jackknife 


0.196 


0.118 


0.072 


0.993 


0.968 


0.887 


71 


= 5.0 


POQUIM 


0.991 


0.954 


0.900 


1.000 


1.000 


1.000 






Jackknife 


0.954 


0.876 


0.715 


1.000 


1.000 


1.000 



Table 3 

POQUIM versus jackknife — power (nominal level 0.05) 



Simulated power 



Alternative 


Method 


I-i 


I-ii 


I-iii 


Il-i 


Il-ii 


Il-iii 


71 


= 0.2 


POQUIM 


0.747 


0.807 


0.745 


1.000 


1.000 


1.000 






Jackknife 


0.728 


0.709 


0.668 


1.000 


1.000 


1.000 


71 


= 0.5 


POQUIM 


0.283 


0.336 


0.286 


0.980 


0.966 


0.917 






Jackknife 


0.277 


0.271 


0.275 


0.981 


0.958 


0.912 


71 


= 2.0 


POQUIM 


0.532 


0.424 


0.369 


0.999 


0.993 


0.973 






Jackknife 


0.411 


0.317 


0.223 


0.999 


0.993 


0.970 


71 


= 5.0 


POQUIM 


0.997 


0.984 


0.956 


1.000 


1.000 


1.000 






Jackknife 


0.991 


0.971 


0.903 


1.000 


1.000 


1.000 



6.2. A balanced two-way random effects model. We now consider the bal- 
anced two-way random effects model of Example 1, also discussed in Sec- 
tion 3.1. Consider testing the hypothesis Hq: af = (T2, or, equivalently, Hq: 
7i = 72 ; which means that the two random effect factors contribute equally 
to the total variation. It is easy to show that the test statistic (27) reduces 
to 

(32) x'- 



1 1 — 2Sr 12 + Sr, 
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where 71 and 72 are the REML estimators of 71 and 72, and Sr^^ is the j, k 
element of the POQUIM estimator Sr of the ACM of ^ = (A, 71, 71)', the 
REML estimator. Note that in this case there are no (fully) specified values 
of the parameters under the null hypothesis, although the latter may still be 
used in some way (but the difference is expected to be small in large samples; 
see the remark below Theorem 4). On the other hand, it is interesting to 



26 



J. JIANG 



Table 4 

POQUIM versus jackknife — power (nominal level 0.10) 



Simulated power 



Alternative 


Method 


I-i 


I-ii 


I-iii 


Il-i 


Il-ii 


Il-iii 


71 


= 0.2 


POQUIM 


0.844 


0.875 


0.807 


1.000 


1.000 


1.000 






Jackknife 


0.829 


0.810 


0.776 


1.000 


1.000 


1.000 


71 


= 0.5 


POQUIM 


0.405 


0.442 


0.396 


0.991 


0.983 


0.954 






Jackknife 


0.398 


0.382 


0.372 


0.991 


0.979 


0.950 


71 


= 2.0 


POQUIM 


0.633 


0.564 


0.462 


1.000 


0.997 


0.987 






Jackknife 


0.540 


0.453 


0.350 


1.000 


0.997 


0.986 


71 


= 5.0 


POQUIM 


0.999 


0.992 


0.975 


1.000 


1.000 


1.000 






Jackknife 


0.998 


0.988 


0.954 


1.000 


1.000 


1.000 



see how the test performs when the straight POQUIM estimator is used in 
the denominator of (32), and that is what we do in this simulation. Once 
again, we study the performance of the test under both moderate and large 
sample sizes, as well as departures from normality. The following sample 
size configurations are considered: Case I, m = 40, n = 40; Case II, m = 
200, n = 200. Furthermore, the following combinations of distributions for 
the random effects and errors are considered: Case i, v, w ^ Normal; Case 
ii, V, w ^ DE; Case iii, v ~ DE, w ~ CE; and Case iv, v, w ^ CE. In all 
cases, e ~ Normal. Note that the jackknife method discussed in the previous 
subsection does not apply to this case, because the observations cannot be 
divided into i.i.d. groups (or even independent groups). The true values of 
parameters are fx = = af = 1.0. 

As in the previous subsection, we first consider the size of the test, so we 
take o"2 = 1.0. The simulated sizes corresponding to the nominal levels 0.01, 
0.05 and 0.10 are reported in Table 5. Next we look at the powers at the 
following alternatives: al = 0.2, 0.5, 2, 5, which correspond to 72/71 = 0.2, 
0.5, 2, 5, respectively. The simulated powers are reported in Tables 6-8. 
Again, all results are based on 10,000 simulations. 

The numbers seem to follow the same pattern. As the sample size in- 
creases, the simulated sizes get closer to the nominal levels and the simu- 
lated powers increase significantly. There does not seem to be a difference, 
in terms of the size, across different distributions. However, the simulated 
powers appear significantly higher when all the distributions are normal as 
compared to other cases where the distributions of the random effects are 
nonnormal. Also, the powers are relatively low when the alternatives are 
close to the null (72/71 = 0.5 or 2.0), but much improved when the alterna- 
tives are further away (72/71 = 0.2 and 5.0). Overall, the simulation results 
are consistent with the theoretical findings of Theorem 4. 
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Table 5 
Simulated size 



Nominal level 


I-i 


I-ii 


I-iii 


I-iv 


Il-i 


Il-ii 


Il-iii 


Il-iv 


0.01 


0.014 


0.011 


0.014 


0.011 


0.011 


0.008 


0.011 


0.008 


0.05 


0.071 


0.061 


0.070 


0.066 


0.053 


0.051 


0.055 


0.048 


0.10 


0.135 


0.126 


0.139 


0.136 


0.108 


0.108 


0.109 


0.102 


Table 6 

Simulated power (nominal level 0.01) 


A Itpmati VP 


I-i 


I-ii 


I-iii 


I-iv 


Il-i 


Il-ii 


Il-iii 


Il-iv 


"yn / — 9 

72/ /I — v.z. 


0.955 


0.568 


0.551 


0.398 


1.000 


1.000 


0.999 


0.986 


72/71 = 0-5 


0.313 


0.100 


0.118 


0.073 


0.988 


0.684 


0.619 


0.439 


72/71 = 2-0 


0.324 


0.100 


0.070 


0.088 


0.988 


0.685 


0.459 


0.443 


72/71 = 5.0 


0.969 


0.649 


0.491 


0.497 


1.000 


0.999 


0.989 


0.992 






Table 7 
Simulated power (nominal I 


evel 0.05) 






Alternative 


I-i 


I-ii 


I-iii 


I-iv 


Il-i 


Il-ii 


Il-iii 


Il-iv 


72/71 = 0.2 


0.994 


0.864 


0.839 


0.713 


1.000 


1.000 


1.000 


0.999 


72/71 = 0.5 


0.579 


0.308 


0.321 


0.232 


0.998 


0.874 


0.819 


0.713 


72/71 = 2.0 


0.595 


0.305 


0.227 


0.256 


0.998 


0.879 


0.764 


0.717 


72/71 = 5.0 


0.997 


0.901 


0.799 


0.779 


1.000 


1.000 


1.000 


0.999 






Table 8 
Simulated power (nominal I 


evel 0.10) 






Alternative 


I-i 


I-ii 


I-iii 


I-iv 


Il-i 


Il-ii 


Il-iii 


Il-iv 


72/71 = 0.2 


0.999 


0.946 


0.923 


0.846 


1.000 


1.000 


1.000 


1.000 


72/71 = 0.5 


0.702 


0.456 


0.451 


0.364 


0.999 


0.931 


0.887 


0.818 


72/71 = 2.0 


0.719 


0.448 


0.359 


0.382 


0.999 


0.936 


0.864 


0.818 


72/71 = 5.0 


0.999 


0.955 


0.904 


0.879 


1.000 


1.000 


1.000 


1.000 



7. Discussion and remarks. A classic parametric statistical model as- 
sumes that the distribution of the data is fully determined by a vector of 
parameters. Under such a model, a maximum likelihood estimator of the 
vector of parameters is self-contained in the sense that the (asymptotic) co- 
variance matrix of the estimator does not involve any additional unknown 
parameter. In many cases, however, a model is not fully determined by a set 
of parameters. For example, under nonnormality, the distribution of the data 
is not determined by the mean and the variance. Obviously, in such cases 
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maximum likelihood does not apply, but a quasi-likelihood method may be 
used to estimate the parameters of direct interest (e.g., [11]). The prob- 
lem is that the estimator may no longer be self-contained. The POQUIM 
method provides a way to estimate the (asymptotic) covariance matrix of a 
maximum quasi-likelihood estimator and, hence, self-contains the latter. 

The general procedure of POQUIM is the following: Let l{9) be the quasi- 
log-likelihood. Then the ACM of 9, the maximum quasi-likelihood estimator, 
is J:=l2^IiI^^, where Ji = \w{dl/de) and = E(d'^l/de 89'). Usually, 
I2 either does not involve additional parameters or, if it does, at least it can 
be estimated by an observed form (e.g., by 8^1/89 39' with 9 replaced by 9). 
However, when the data are correlated, the matrix Ii cannot be estimated 
by an observed form. The idea of POQUIM is to express Ii as £(5"!) -|- S2 
such that £(5"!) involves parameters other than 9 but that can be estimated 
by an observed form (i.e., by Si with 9 replaced by 9), and 5*2 does not 
involve any additional parameters and therefore can be estimated by an 
estimated form (i.e., by S2 with 9 replaced by 9). In this paper this general 
method is applied to mixed linear models. 

In this paper we assume that the random effects and errors in a non- 
Gaussian mixed linear model are independent. This means that (i) the vec- 
tors ai, . . . ,as,e are independent and (ii) the components of aj {1< j < s) 
and e are independent. A mixed linear model that satisfies (1) as well as 
(i) and (ii) above is also known as an analysis of variance mixed (ANOVA) 
model (e.g., [5], where normality is assumed). Mixed ANOVA models are 
very popular in practice. On the other hand, there are also mixed linear 
models used in practice that involve dependent random effects or errors, 
such as the so-called longitudinal model (e.g., [5], [18]). For example, in Sec- 
tion 3.4 it may be reasonable to assume that the random intercept, a^, and 
slope, bi, which correspond to the same individual, are correlated. Note that, 
in cases of correlated random effects, the variance components are defined as 
the parameters involved in V, the covariance matrix of y, that involve corre- 
lations in addition to the variances. Furthermore, there are more additional 
parameters involved in the ACM of, say, the REML estimator. However, a 
possible POQUIM decomposition may still be obtained. For example, by (9), 
one can write 



cov 



^ (ii,...,i4)G5i 



+ ^ ^ -Sj j-i^^jji?^ 42,14 COv('Ujj'U22 , tiiglij^) 
(il,.. .,44)652 



= h+h, 

where ^2 are those indices such that cov {ui-^Ui^,Ui^Ui^) involve only the 
variance components and Si are those indexes such that cav{ui.^Ui^,Ui^Ui^) 
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involve additional parameters. If Ii can be further expressed as an expected 
value plus a term that depends only on the variance components, one has a 
potential POQUIM decomposition. 

The robust dispersion test derived in Section 5 is of type. As men- 
tioned in Section 6.1, tests are omnibus (e.g., [21], Section 1.1). However, 
a directional test could be obtained, using asymptotic normality of, say, the 
REML estimator § and the POQUIM estimator of its ACM. The only ex- 
ception is when 9 lies on the boundary of the parameter space, because, 
obviously, in this case 6 cannot be asymptotically normal if it is required 
to stay in the parameter space. However, for testing purposes one may re- 
lax the latter restriction, for example, by defining 9 as the solution to the 
REML equation, which may or may not be in the parameter space (see the 
remark above Theorem 2). With such a definition, asymptotic normality of 
9 may still hold, even if 9 is on the boundary of the parameter space. See, 
for example, [22]. Such a result may be used for testing, for example, that 
some of the variance components are zero. 

The conditions of Theorem 2 [more specifically, condition (iii)] imply the 
existence and consistency of the REML estimator. See the remarks below 
Theorem 2 and also those above Theorem 2 regarding the definition of the 
REML estimator. Of course, this is a large sample result, which does not 
guarantee the existence of the REML estimator in a finite sample situation, 
even under the normality assumption. A similar problem exists for the ML 
estimator as well. See, for example, [6] and [4]. 

8. Proofs and other technical details. 

8.1. Partial derivatives of the quasi-restricted log-likelihood. Differenti- 
ating (5) with respect to 9 and using the fact PVP = P, we have dl-^/dX = 
{y'Py - N + p)/2\ and dl^/d-fj = {\/2){y' PZ^Z'^Py - tr{PZjZ'^)}, l<j< 
s. Note that since PX = 0, the vector y in the above expressions can be re- 
placed hy u = y — Xp. Furthermore, we have E(9^/r/(?A^) = — {N — p)/2X^ , 
E{dHK/dXd^j) = -(l/2)tr(PZ,Zj), 1 < J < s, and E{d^ln/d^j d^k) = 
-iXyiMPZjZ'jPZkZ'f^), l<j,k<s. 

8.2. Proof of Lemma 1. First note Ui = J2t=o '^it with uu = Yl'i^i ZitlO-tV, 
hence 'Ei{ui^Ui^) = J2t=o^i'^htUi2t) = J2t=o(^t4it^i2t = XT{ii,i2). Next, we 
have E{ui-^ ...Ui^)= 'Eti,...,t4^iuhti ■ ■■Ui^u) and E{ui^ti ' ■■Ui^u) = unless 
(1) ti = • • • = ^4 or (2) the t's are in two pairs. Furthermore, under (1) 
we have E{ui^ti ' ' ' '^uu) = Y.h,...,h Zhth ' ' • Zi^tuE{ati^ ■ ■ ■ au^), where ti = 
■ ■ • = t4 = t. Again, E{au-^ • • • au^) = unless (1-1) h = • • • = U, (1-2) li = 
h ^ k = h, (1-3) h = k ^ h = k or (1-4) l^ = l^ ^ I2 = I3. It is easy 
to show that E(i-i) • • • = ^{(^ti)ziit ■ --Zut, E(i-2) • • ' = ^^t {(^nt ' ^j2t)(^«3t ' 



30 J. JIANG 



Zut) — Zi^t • • • Zi^t}, J2{i-3) • • • — (^tii^iit ■ Zi.^t)izi2t ■ Zi^t) — Zi^t ■ ■ ■ Zi^t} and 
E(i-4) • • • = f^t {(^«it • Zut){zi2t ■ Zi^t) - Zi^t ■ ■ ■ Zi^t}- It follows that 

(1) 

= ^KtZi^t---Zut 

t 

Zi4t)(,Zi2t ' Zigt) }• 
t t J 

Similarly, (2) has three cases: (2-1) ii = t2 / ^3 = ti, (2~2) ti = is / ^2 = ^3 
and (2-3) ii = ^4 7^ ^2 = ^3- Furthermore, we have 



E(tiijti •••^^14*4) = ^^s r(ii,i2)r(«3,i4) -J2^t(^iit ' ^«2t)( 

2-1) I t 

2-2) I t 

2-3) I t 



Z'i^t ' Z'i^t ) ( 5 



Zi2t ' Zi^t) ( ) 



Zi2t ' Zi'^t 



Therefore, in conclusion, we have cov {ui-^Ui2,Ui.^Ui^) = E(iijj • • • Uj^) — 
A^r(ii, i2)r(i3, 24) equal to the right-hand side of (6). 

Equation (7) is easily derived from (6), observing the first equation of (9). 

8.3. Proof of Theorem 2. In the following discussion A^r, ... represent 
sequences of matrices (vectors, numbers), but for notational simplicity we 
suppress the subscript N. Note that for a matrix B, B > means that B is 
positive definite. We first state and prove a lemma. 

Lemma 3. Let A, G be sequences of positive definite matrices such that 
G~^AG"^ — > i? > 0. Let A be another sequence of matrices. Then A~^I'^A x 
^4"^/^ — > /, the identity matrix, if and only if G~^{A — A)G~^ 0. 

Proof. Suppose that A'^/^ j;^-i/2 _^ j rj.^^^ 

we have 

G-\A - A)G-' = G~^A'/\A~'/'AA-'/^ - L)A^'^G-\ 
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Therefore, 

\\G-\A - A)G~^\\ < ||G-iAi/2||2p-V2i^-V2 _ j|| 

Now suppose that G'^iA - A)G-'^ 0. Let A = G'^AG-^ - B. Then 
we have 

^-1/2 A-^'^GBGA-^'^ - I 

+ A~'/^GB'/\B-'/^G~'AG~'B~'/^ - I)B'/^GA~'/^ 
= Di+D2. 

We have Di = A'^/^iGBG - A)A-^I'^ = -A'^I'^G^GA'^I'^; thus 
ll^ill < p-V2(^||2||A|| = Aniax(G^"^G)||A|| ^ 0, because GA-^G B-^ 
and A ^ 0. On the other hand, we have 

B-^I^G-^AG-^B-^I'^ -1 = B~^/^G~^AG-^B-^/^ - I 

+ B^^I'^G~^{A - A)G^^B~^/^ 0. 

Therefore, 

\\D2\\ < \\A-'/^GB'/Y\\B^'^^G-'AG-'B-^/^ - I\\ 

= Xn..^{B'/^GA-^GB'/^)\\B-^/^G-'AG-'B-'/'~ ~ I\\ — > 0. □ 

Recall that a sequence of matrices M is bounded from above if ||M|| is 
bounded; the sequence is bounded from below if ||M~^|| is bounded. 

Corollary 1. Let A, G be sequences of positive definite matrices such 
that G~^AG~^ is bounded from above as well as from below. Let A be a 
sequence of random matrices. Then A~'^l'^ AA~^I'^ L in probability if and 
only if G~^{A — A)G~^ — > in probability. 

Proof. A sequence of random matrices converges in probability if and 
only if for any subsequence, there is a further subsequence that converges 
almost surely (to the same limit). 

First assume that A~^^'^AA~^^'^ L in probability. For any subsequence 
of G~^{A — A)G~^, since the corresponding subsequence of G~^AG~^ is 
bounded, there is a further subsequence such that G~^AG~^ —>■ B for some 

> 0. The latter property is implied by the boundedness from below of the 
subsequence. Consider the corresponding further subsequence of A~^^'^AA~^^'^ 
Since it converges in probability, there is a further subsequence such that 
A~^^'^AA~^/^ L almost surely. It follows, by Lemma 1, that the corre- 
sponding further subsequence G~^{A — A)G~^ — > almost surely. 
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Next assume that G~^{A — A)G~^ — > in probability. By similar argu- 
ments we have for any subsequence of A^^^'^AA'^^^'^ that there is a further 
subsequence that I almost surely. □ 

The next lemma states that the Gaussian information matrix Q and the 
QUIM Ti are asymptotically of the same order. 

Lemma 4. Under condition (i) of Theorem 2, there are positive con- 
stants a and b such that aQ <Zi <hQ . 

Proof. First, it is easy to show that 

tl,t2=0 lj^=ll2=l 

Condition (i) of Theorem 2 implies that there is < < 1 such that Kt = 
vai{a^i) — 2af > 2{5 — l)af, <t < s. Thus, it can be shown by (7) that for 

any x = ixj)o<j<s, 

s mt f s ^ 2 

x'lix = x'gx + J2'^tJ2 \ ^^ii^'tiBjZti) \ 
t=0 1=1 lj=o J 



> Sx'Gx + 2(1 - 5) 



2 



tl,t2=0 li=ll2 = l {j=0 

t=0 1=1 {j=0 ) 

> Sx'Qx. 

On the other hand, condition (i) of Theorem 2 implies that there is M > 
such that K4 < 2Maf, <t < s. Thus, we have, similarly, 

s mt f s ~j 2 

x'lix < x'Gx + 2M E E i E ^^i^M 

t=0 1=1 I j=0 J 

<(\^M)x'Qx. □ 

Corollary 2. Under condition (i) of Theorem 2, G~^QG~^ is bounded 
from above as well as from below if and only if G~^IiG~^ is bounded from 
above as well as from below. 

Throughout the rest of the proof, c represents a positive constant whose 
value may be different in each occurrence. 
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First prove that Ii is consistent. According to condition (iii) and Corol- 
lary 2, the sequence is bounded from above as well as from below. 
Then, according to Corollary 1, it suffices to show that G~^{Ii —Ti)G~^ — > 
in probability. 

First consider the observed part. Let V = {\e-e\< 5, \(3-[3\< 5} {5 > 0) 
and Xi,ijfc = E^i Cj,k,iJ2f{n,...,u)=fi ' ' •'"m- I* is easy to show that, on V, 
\uh ■■■Ui4 -Uij •••UiJ <cSJ2t=i{yt + \xir\^) and \ui^ ■■■uul <cJ2t=iiyt + 
Ixj^l^). It follows that, on P, 

f{il,...,i4)=fi /(n,...,j4)=/i 

4 

<c{dj^k^i{6) + \cj^k,i\^} J2 J2(yt + \^^rf)- 

/(jl,...,i4)=/;r=l 

Conditions (i) and (ii) imply that E(y^) < c. Therefore, we have 
^{{9j9ky'^\Ii,i,jk -Ii,i,jk\'i-v} 

(33) < c{g,g,)-'j2{dj,kA^) + \cj,k,i\S} ^ E{E(2/t) + l^vl'} 

1=1 /(n,..,i4)=/,r=l 

I 1=1 1=1 } 

On the other hand, given M > 0, we have au = aui + ottn, where am = 

^tl'^(\ati\<M) -^{(^tl'^{\ati\<M)}- ThuS, 

s 

Ui = Y^ z'^at = Uii + Ui2, 
t=o 

where = J2t=o^h^i ^iti'^tin ^ = 1)2- Conditions (i) and (ii) imply that 
E«) < cJ2t=o\\zit\\i < c and Eiuj^) < ch'^{M), where b{M) = 
E?=oE(a^il(|a,i|>M)) ^ as M ^ oo. Write 

Ui^ ■■■Ui^- E(nii • • ""m) = Uiii ■ --Ui^i - E{ui^i ■ ■■Ui^i) + (•••)- E(- • •), 

where (• • •) is a sum of products with each product involving at least one 
Uir2 (r = I, ■ ■ ■ ,4:). It follows by Holder's inequality that |E(- • •)! < E| • • • | < 
cb{M). Therefore, we can write 

L 

il,l,jk—Il,l,jk = ^^Cj^k,l X! {Uiil ■ ■ ■ Ui^l — E{ui^l ■ ■ ■ Ui^l)} 

1=1 f{ii,-M)=h 
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E {(--o-Eo--)} 

'=1 f{ii,---M)=fl 
= Si + S2 

with E{\S2\/gjgk) <cb{M){gjgk)~'^Y.i=ih\cj,k,i\- Furthermore, we have 

L 

X E COv(-Uj^i • • •Mj^i,Mjgi • • -Migi). 

/(n,...,i4)=/i^,/(i5,---,«8)=/!2 

The nonzero components of dj^ + • — h (ii4 (dj is defined in the second para- 
graph of Section 2.2) correspond to the indexes of the random effects and 
errors involved in Uj-^i • • • ui^i. Thus, if {di^ + ■ ■ ■ + di^) ■ {di^ + • • • + djg) = 0, 
Mill • • • Ui^i and Ui^i • • • lijgi involve different random effects and errors, hence 
cov(tijji • • • Mj^i, Mjgi • • • Mjgi) = 0; otherwise, the covariance is bounded in ab- 
solute value by cM^ . It follows that 

L 

HSi/g,gkf <cM\gjgk)-^ E 

h,l2=l 

In conclusion, we have 

,l,jk — '^l,l,jk\} 
L 

(34) <c6(M)(5,-5fc)-iE^;|ci,Ml 

1=1 



L 

{djgky^ E ^h,l2\'^3,k,h^j,k,h\ 
\ li,l2=l 



Now consider the estimated part. We have 

^l,2,jk — ^l,2,jk 

= 2{ti{BjVBkV) - tT{BjVBkV)} 

L 

-2,)?^{cj^k,i- Cj^k,i) E f(ii,i3)f (22,44) 

'=1 f{ii,--;ii)=fi 
L 

-3A^E'=i,fc,' E {f(ii,«3)f (22,44) - r(n,i3)r(i2,Z4)} 

1=1 f(ii,-,u)=fl 
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L 

-3(A^ - A^)^Cj,fc,i r(ii,i3)r(z2,i4) 

^=1 f{ii,---M)=fi 

r=l 

On P we have |Ti| < gj^ki^)- Also, condition (ii) implies that, on D, IT2I < 
cJ2b=ihidj^k,iiS) and \Tr\ < c6J2b=i^i\'^j,k,i\, f = 3,4. Therefore, we have 

{9j9k) ^\il,2,jk - ^l,2,jk\ 

L 

(35) < {gjgk)~^gj^k{S) + c{gjgk)~^ ^ hdj^uA^) 

1=1 

L 

+ c5{gjgkY^Yhi\cj^k,i\ on P. 
1=1 

For any > and p > 0, first choose (5 > and M > such that the 
right-hand side of (33) is less than r]p/15, the right-hand side of (35) is less 
than ri/3 and the first term on the right-hand side of (34) is less than rjp/15, 
which one can do according to conditions (i), (ii), (iv) and (v). Then choose 
A^o such that, when N > Nq, we have P{T>^) < p/5, and the second term on 
the right-hand side of (34) is less than rjp/15, which one can do by conditions 
(iii) and (iv). Note that condition (iii) implies consistency of 9 and (3 (see 
the remarks below Theorem 2). It follows that, when N > Nq, by (33) and 
Chebyshev's inequality, 

p(i9j9k) — ^i,ijfc| > 

<p{i9j9k)-^\ii,i,jk -ii,ijk\iv > 1) +P(P'=) 

< -^{{9j9k)~'^\1i,i,jk -ii,i,jk\^v} + -<-p- 

Tj 

Similarly, by (34), P((5'j5'fe)"^|^i,ij/t - ^Ti.ijfcl > ??/3) < (2/5)p, and by (35), 
P{igj9k)~^\ii,2,jk — ^i,2j7c| > ??/3) < P(P'^) < p/5. Therefore, we conclude 
that when > Nq, the probability is greater than 1 — p that {gjgk)~^ x 

\^i,jk —Ii,jk\ < V- 

Now consider consistency of Sr- First note that T2 = — ^, which is nonsin- 
gular by condition (iii). Since 012^0 = {G-^gG-^)-^G-^IiG-HG-^gG-^)-\ 
by Corollary 2, GSrG is bounded from above as well as from below. Then, by 
Corollary 1 [note that G = (G^^)"^], it suffices to show that G(Sr-Sr)G 
in probability. 
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By condition (v) and consistency of 6, it is easy to show that G ^{Q — 
Q)G~'^ in probabihty, where ^ is ^ with 9 replaced by 9. Note that 
±2 = -Q. Also, since G-^QG-^ = G^^QG"^ + G-^{g - g)G~\ we have 
Amm(G-igG-i) > XminiG^^GG"^) - \\G-^ {G - g)G~^ \\, which is bounded 
away from zero with probability tending to 1. It follows that {G~^QG^^)~^ = 
Op(l). Furthermore, we have 

A = {G'^gG~^y^ - {G-^GG-^)-^ 
= -{G-^gG-^y^G-\g - g)G-\G-^gG-Y^ o 

in probability. Therefore, we have 

g(Sr - Sr)g = {G~^gG~^y^G^^iiG-^ A + AG-^iiG-\G~^gG~^y^ 
+ {G~^gG~^y^G~\ii-ii)G~\G~^gG~^y^ — 

in probability, using the results previously proved. 

8.4. Proof of Theorem 4. We have 

= {K'9 - ^)'{K'j:nKr\K'9 - ^) 

+ {K'9 - ^y{{K'±nKr' - {K'^kK)-^}{K'9 - ^) 
= X' + A. 

By (26), ^ Xri so it remains to show that A — > in probability. 

According to the definition in Section 2.2 (first paragraph) and the conclu- 
sion of Theorem 2, for any r/ > we have, with probability tending to 1 (here- 
after w.p. — > 1), (1 — rj)Ti-^ < < (1 + ??)5]r. It follows that w.p. ^ 1 (1 — 
r])K'T,KK < K't^K < (1 + r])K'T.^K and hence (1 + rj)~^{K'T.KKy^ < 
{K't^K)-^ < (1 -?7)-^(i^'SRK)-i (e.g., [20], Theorem A.52). Therefore, 
we have w.p. 1, 

(36) (1 + vr'lr < {K'Y.^Kf'^{K't^K)-\K'Y.^Kfl^ < (1 - r?)"^/,, 

which implies \\W - Ir\\ < {(1 - r/)~^ - 1} V {1 - (1 r])'^}, where W is 
the middle term in (36) and a V 6 = max(a, 6). Since r] is arbitrary, this 
proves that W Ir in probability. That A ^ in probability then follows 
by observing 

A = {K'9 - ^)'{K'i:^Kr^l\W - Ir){K'Y.^Kr^/\K'9 - 99), 
hence ||A|| < \\W - Ir\\{K'9 - ip)'{K'^KK)~^{K'9 - if) = \\W - Ir\\x^. 
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